Introduce configuration to utilize nvidia GPU on docker IM #494
0405ysj wants to merge 1 commit into google:main
Conversation
- im = instances.NewDockerInstanceManager(config.InstanceManager, cli)
+ im, err = instances.NewDockerInstanceManager(config.InstanceManager, cli)
+ if err != nil {
+     log.Fatal("Failed to create Docker Instance Manager: ", err)
This function should return (instances.Manager, error) if it can fail like this.
I used log.Fatal to keep consistency, as other places in this file do. Would you like me to refactor it here?
type DockerIMConfig struct {
    DockerImageName string
    GpuManufacturer string
Avoid terms like manufacturer that are not part of the Docker documentation (https://docs.docker.com/desktop/features/gpu). Try to use names similar to those in the documentation that Docker users are already familiar with.
My suggestion is equivalent to --env NVIDIA_DRIVER_CAPABILITIES=all --gpus all --runtime nvidia when executing docker run, and I don't think there's a proper name in the Docker documentation that represents my purpose.
--env and --runtime look fine to expose in the CO configuration, but --gpus in docker run directly specifies GPU allocation. I think --gpus shouldn't be exposed in the CO configuration, since DockerIM runs multiple docker instances and we probably want CO to handle GPU allocation later. I don't want to design how DockerIM allocates GPUs right now, as it's pretty complicated to consider the details of nvidia GPUs.
So, I need to define a new name to pass the information of whether DockerIM will utilize the GPU or not. Retrieving that information by parsing --env or --runtime would be appropriate. Alternatively, a boolean configuration such as UseNvidiaGpu looks valid to me. If the CO configuration for GPU is representative enough to set --env or --runtime, I think we don't need to define new configurations in advance, which could cause compatibility issues in the far future.
type DockerIMConfig struct {
    DockerImageName string
    GpuManufacturer string
The ability to create docker instances using a GPU should be part of the CO public API, not hidden as a CO configuration. Please explain why you want to hide this ability from end users.
For reference, the ability to add accelerators is part of the public API for GCE hosts. See #319. Also, the gcloud and docker CLIs follow the same principle. Going the opposite way here should be properly justified.
I cannot say it's under the same principle because of how docker run --runtime works. The valid values of the --runtime flag depend on the dockerd configuration. At the very least, it isn't appropriate to expose this in the cvdr CLI; it should be in the CO configuration.
$ sudo nvidia-ctk runtime configure --runtime=docker # Modifies /etc/docker/daemon.json with adding a new runtime.
$ cat /etc/docker/daemon.json
{
"runtimes": {
"nvidia": {
"args": [],
"path": "nvidia-container-runtime"
}
}
}
$ sudo systemctl restart docker
# Then users can execute `docker run --runtime nvidia [args]`
On the other hand, I think it may be complicated to reach agreement from here... I'll propose a design around GPU utilization when I have time, perhaps with a GPU allocation mechanism too.
Context: b/455678690
By configuring Cloud Orchestrator as below, Cloud Orchestrator with dockerIM can utilize an nvidia GPU.